Trading robust representations for sample complexity through self-supervised visual experience
Learning in small sample regimes is among the most remarkable features of the human perceptual system. This ability is related to robustness to transformations, which is acquired through visual experience in the form of weak- or self-supervision during development. We explore the idea of allowing artificial systems to learn representations of visual stimuli through weak supervision prior to downstream supervised tasks. We introduce a novel loss function for representation learning using unlabeled image sets and video sequences, and experimentally demonstrate that these representations support one-shot learning and reduce the sample complexity of multiple recognition tasks. We establish the existence of a trade-off between the sizes of weakly supervised data sets, obtained automatically from video sequences, and fully supervised data sets. Our results suggest that equivalence sets other than class labels, which are abundant in unlabeled visual experience, can be used for self-supervised learning of semantically relevant image embeddings.
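One way to read the one-shot evaluation mentioned in the abstract is as nearest-neighbor classification in the learned embedding space: with a single labeled example per class, a query is assigned the label of its closest support embedding. The sketch below illustrates that reading; the function name and the nearest-neighbor rule are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np

def one_shot_classify(query_emb, support_embs, support_labels):
    """Assign the query the label of its nearest support embedding
    (squared Euclidean distance), with one labeled example per class."""
    dists = [float(np.sum((query_emb - s) ** 2)) for s in support_embs]
    return support_labels[int(np.argmin(dists))]
```

Under this reading, the quality of the self-supervised embedding alone determines one-shot accuracy, since no parameters are fit at test time.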
Reviews: Trading robust representations for sample complexity through self-supervised visual experience
This submission describes a model for unsupervised feature learning based on the idea that an image and the set of its transformations (described here as the orbit of the image under the action of a transformation group) should have similar representations according to some loss function L. Two losses are considered. One is a ranking loss that enforces examples from the same orbit to be closer than those from different orbits. The other is a reconstruction loss enforcing that all examples from an orbit map to a canonical element of the orbit through an autoencoder-like function. Two broad classes of transformation groups are considered: the first is a set of parametrized image transformations (as proposed by Dosovitskiy et al. 2016), and the other is based on prior knowledge from the data, in this case the tracking of faces in a video (as proposed by Wang et al. 2015). The proposed approach is described within a very clear framework, with proper definitions.
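The two losses the review describes can be sketched minimally as follows. This is an illustrative reading, not the paper's implementation: the function names, margin value, and squared-Euclidean distance are assumptions, and real training would operate on batches of learned embeddings rather than single vectors.

```python
import numpy as np

def orbit_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style ranking loss: an embedding from the same orbit as the
    anchor (positive) must be closer than one from a different orbit
    (negative) by at least `margin`, else the violation is penalized."""
    d_pos = np.sum((anchor - positive) ** 2)   # same-orbit distance
    d_neg = np.sum((anchor - negative) ** 2)   # different-orbit distance
    return max(0.0, margin + d_pos - d_neg)

def orbit_reconstruction_loss(orbit_embs, canonical):
    """Autoencoder-style loss: every embedding of an orbit element should
    reconstruct the orbit's canonical element; here, mean squared error."""
    return float(np.mean([np.sum((e - canonical) ** 2) for e in orbit_embs]))
```

The ranking variant only needs orbit membership (e.g. frames from one face track), while the reconstruction variant additionally requires designating a canonical orbit element.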
Tacchetti, Andrea, Voinea, Stephen, Evangelopoulos, Georgios